Network visualization using output from the text model
Data preparation
Import data. The edge list comes from the voting mechanism that we applied after running three embedding methods on the definition of each indicator: an indicator and a related indicator are kept as a pair if their similarity score meets the chosen cutoff.
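library(dplyr)    # %>%, select(), filter(), distinct()
library(stringi)  # stri_match_first_regex()
library(igraph)   # graph_from_data_frame(), degree(), plot()
edgelist <- read.csv("~/Documents/GitHub/G5055_Practicum_Project2/Data/Text_Model_Data/edgelist.csv")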
edgelist
For future classification of indicators into the goals they belong to, create the nodes dataframe:
nodes <- edgelist %>%
select(indicator, related_indicator)
nodes <- data.frame(indicatorname = unlist(nodes, use.names = FALSE))
nodes <- distinct(nodes)
# The goal number is the text before the first dot of the indicator code
nodes$goal <- stri_match_first_regex(nodes$indicatorname, "(.*?)\\.")[,2]
nodes$goal <- as.numeric(nodes$goal)
nodes
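As a quick check of the goal-extraction step, the sketch below applies the same idea to a few indicator codes. It uses base R's `sub()` in place of the stringi call, which gives the same result for codes that contain at least one dot; the `codes` vector and the `toy_nodes` name are illustrative and not part of the project data.

```r
# Example indicator codes in the "goal.target.indicator" style used above.
codes <- c("5.c.1", "12.2.1", "17.18.1", "5.c.1")

# Deduplicate, mirroring the distinct() step.
toy_nodes <- data.frame(indicatorname = unique(codes), stringsAsFactors = FALSE)

# Drop everything from the first dot onward, leaving the goal number.
toy_nodes$goal <- as.numeric(sub("\\..*$", "", toy_nodes$indicatorname))
print(toy_nodes)
```

Each row of `toy_nodes` then carries the numeric goal that is later used to index the colour vector.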
Visualization
In the network graph below, the size of each vertex (each indicator) represents the number of related indicators connected to it. The width of each edge is determined by the similarity score between the pair of indicators it links. The indicators are grouped by the goal they belong to, and each goal is shown in a different vertex color.
g<-graph_from_data_frame(edgelist, directed=FALSE, vertices=nodes)
#Add attributes
E(g)$weight<-E(g)$similarity_score
# The graph is undirected, so degree() ignores mode; in_degree holds the plain degree
V(g)$in_degree<-degree(g)
colrs<-c("#ea1d2d", "#d19f2a","#2d9a47", "#c22033","#ef412a", "#00add9", "#fdb714", "#8f1838", "#f36e24", "#e01a83", "#f99d25", "#cd8b2a", "#48773c", "#007dbb", "#40ae49", "#00558a", "#1a3668")
V(g)$color<-colrs[V(g)$goal]
#Plot graph
plot(g, vertex.label=NA, edge.color="gray77", vertex.color=V(g)$color, vertex.size=V(g)$in_degree, edge.width=E(g)$weight*10, layout=layout_nicely(g))

plot(g, vertex.label.color="black", vertex.label.cex=2.5, edge.color="gray77", vertex.color=V(g)$color, vertex.size=V(g)$in_degree, edge.width=E(g)$weight*10, layout=layout_nicely(g))

#legend(x=-11, y=-11, c("Goal 1","Goal 2","Goal 3","Goal 4","Goal 5","Goal 6","Goal 7","Goal 8","Goal 9","Goal 10","Goal 11", "Goal 12","Goal 13", "Goal 14", "Goal 15", "Goal 16", "Goal 17"), pch=20, col="#777777", pt.bg=colrs, pt.cex=2, cex=.8, bty="n", ncol=1)
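If a legend is wanted, one possible adaptation of the commented-out call is sketched below. Two things are assumptions here: `pch=21` is used because `pt.bg` only fills symbols 21 to 25, and the legend is placed with the `"topleft"` keyword rather than explicit coordinates, which depend on the plot's coordinate system. The sketch draws on an empty off-screen page so that it runs on its own; in the report the `legend()` call would follow the `plot(g, ...)` call.

```r
# SDG goal colours, copied from the colrs vector defined above.
colrs <- c("#ea1d2d", "#d19f2a", "#2d9a47", "#c22033", "#ef412a", "#00add9",
           "#fdb714", "#8f1838", "#f36e24", "#e01a83", "#f99d25", "#cd8b2a",
           "#48773c", "#007dbb", "#40ae49", "#00558a", "#1a3668")
goal_labels <- paste("Goal", seq_along(colrs))

pdf(NULL)    # off-screen device, no file is written
plot.new()   # stand-in for the network plot
# pch = 21 is a circle with a border (col) and a fill (pt.bg).
leg <- legend("topleft", legend = goal_labels, pch = 21, col = "#777777",
              pt.bg = colrs, pt.cex = 2, cex = 0.8, bty = "n", ncol = 1)
dev.off()
```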
Key takeaways
Judging from the visualization, the goals with the most connections to other goals include Goal 17, Goal 1, Goal 15, and Goal 9.
Goal 17 and Goal 1 have large vertices, that is, individual indicators that are each connected to many other indicators.
Goal 15 and Goal 9 have relatively small vertices, but a relatively large number of their indicators are connected to other indicators, so they are also easy to spot on the graph.
Judging by edge width, groups of indicators such as 5.c.1-5.4.1-4.a.1 and 8.4.2-12.2.1-8.4.1-12.2.2 have much larger similarity scores than most other pairs, suggesting a very close connection.
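These observations are read off the plot, but they could also be checked numerically. The following is a minimal sketch of one way to do so, counting for each goal the edges that link it to a different goal. The indicator codes are taken from the text above, while the edge pairings and similarity scores in `toy_edges` are placeholders, not values from edgelist.csv; `goal_of` is a hypothetical helper.

```r
# Toy edge list: codes come from the text, pairings and scores are placeholders.
toy_edges <- data.frame(
  indicator         = c("5.c.1", "5.4.1", "8.4.2", "8.4.1", "8.4.1"),
  related_indicator = c("5.4.1", "4.a.1", "12.2.1", "12.2.2", "8.4.2"),
  similarity_score  = c(0.9, 0.8, 0.95, 0.9, 0.85),
  stringsAsFactors  = FALSE
)

# Hypothetical helper: goal number = text before the first dot.
goal_of <- function(code) as.numeric(sub("\\..*$", "", code))

toy_edges$goal_from <- goal_of(toy_edges$indicator)
toy_edges$goal_to   <- goal_of(toy_edges$related_indicator)

# Keep only edges whose endpoints belong to different goals.
cross <- toy_edges[toy_edges$goal_from != toy_edges$goal_to, ]

# Count, per goal, how many cross-goal edges touch it.
cross_counts <- sort(table(c(cross$goal_from, cross$goal_to)), decreasing = TRUE)
print(cross_counts)
```

Running the same tally on the real `edgelist` would show whether Goals 17, 1, 15, and 9 do in fact lead in cross-goal connections.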
Network visualization using output from the social network model
Data preparation
The data for Indonesia and Guatemala used in this section can also be obtained from GitHub. Specifically, after the indicator data is downloaded from the API, disaggregated measurements are removed and the data is reshaped into pivot format.
Within each indicator, measurements that contain only one year of actual data are deleted, and the indicator itself is deleted if it contains only one year of actual data. Each indicator that is kept is written to a separate file with years as rows and the indicator's measurements as columns.
Linear regression is used for data imputation. Once datasets without missing values have been created, PCA is used to reduce dimensionality so that each indicator has one value.
Correlation is then calculated and a social network model is applied. The resulting file in the link includes a coefficient between each pair of indicators.
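The steps above can be sketched on synthetic data as follows. This is not the project's actual pipeline: the helper names (`impute_linear`, `first_pc`, `make_indicator`), the use of a linear trend over years for imputation, and the use of the first principal component as the single value per year are all assumptions, and the social network model itself is not reproduced. The last step reshapes the correlation matrix into long form, which yields `Var1` and `Var2` columns like those in the coefficient files.

```r
set.seed(1)
years <- 2000:2009

# Hypothetical helper: fill NAs in one series with a linear trend over years.
impute_linear <- function(y, years) {
  if (!anyNA(y)) return(y)
  fit <- lm(y ~ years)  # rows with NA are dropped when fitting
  y[is.na(y)] <- predict(fit, newdata = data.frame(years = years[is.na(y)]))
  y
}

# Hypothetical helper: collapse measurements (columns) to the first principal component.
first_pc <- function(m) prcomp(m, center = TRUE, scale. = TRUE)$x[, 1]

# Synthetic indicator: a years x measurements matrix with three gaps.
make_indicator <- function(n_meas) {
  m <- sapply(seq_len(n_meas),
              function(i) years * runif(1, 0.5, 2) + rnorm(length(years)))
  m[sample(length(m), 3)] <- NA
  m
}
indicators <- list("1.1.1" = make_indicator(2),
                   "3.2.1" = make_indicator(3),
                   "17.8.1" = make_indicator(2))

scores <- sapply(indicators, function(m) {
  m <- m[, colSums(!is.na(m)) >= 2, drop = FALSE]  # drop one-year measurements
  m <- apply(m, 2, impute_linear, years = years)   # impute remaining gaps
  first_pc(m)                                      # one value per year
})

# Pairwise correlations, reshaped to an edge list with Var1, Var2 columns.
edges <- as.data.frame(as.table(cor(scores)), stringsAsFactors = FALSE)
names(edges)[3] <- "coefficient"
edges$abs <- abs(edges$coefficient)
edges <- edges[edges$Var1 != edges$Var2, ]
print(edges)
```

The self-pairs removed in the last line correspond to the `filter(Var1!=Var2)` preprocessing applied to the coefficient files below.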
Indonesia
edgelistindo <- read.csv("~/Documents/GitHub/G5055_Practicum_Project2/Data/PCA_results/indo_coefficients.csv")
#Some preprocessing
edgelistindo<-edgelistindo%>%
select(Var1, Var2, abs)%>%
filter(Var1!=Var2)
edgelistindo
For future classification of indicators into the goals they belong to, create the nodes dataframe:
indonodes <- edgelistindo %>%
select(Var1, Var2)
indonodes <- data.frame(indicatorname = unlist(indonodes, use.names = FALSE))
indonodes <- distinct(indonodes)
# The goal number is the text before the first dot of the indicator code
indonodes$goal <- stri_match_first_regex(indonodes$indicatorname, "(.*?)\\.")[,2]
indonodes$goal <- as.numeric(indonodes$goal)
indonodes
Guatemala
edgelistguate <- read.csv("~/Documents/GitHub/G5055_Practicum_Project2/Data/PCA_results/gua_coefficients.csv")
#Some preprocessing
edgelistguate<-edgelistguate%>%
select(Var1, Var2, abs)%>%
filter(Var1!=Var2)
edgelistguate
guatenodes <- edgelistguate %>%
select(Var1, Var2)
guatenodes <- data.frame(indicatorname = unlist(guatenodes, use.names = FALSE))
guatenodes <- distinct(guatenodes)
# The goal number is the text before the first dot of the indicator code
guatenodes$goal <- stri_match_first_regex(guatenodes$indicatorname, "(.*?)\\.")[,2]
guatenodes$goal <- as.numeric(guatenodes$goal)
guatenodes
Visualization
Indonesia
g2<-graph_from_data_frame(edgelistindo, directed=FALSE, vertices=indonodes)
#Add attributes
E(g2)$weight<-E(g2)$abs
# The graph is undirected, so degree() ignores mode; in_degree holds the plain degree
V(g2)$in_degree<-degree(g2)
colrs<-c("#ea1d2d", "#d19f2a","#2d9a47", "#c22033","#ef412a", "#00add9", "#fdb714", "#8f1838", "#f36e24", "#e01a83", "#f99d25", "#cd8b2a", "#48773c", "#007dbb", "#40ae49", "#00558a", "#1a3668")
V(g2)$color<-colrs[V(g2)$goal]
#Plot graph
plot(g2, vertex.label=NA, edge.color="gray77", vertex.color=V(g2)$color, vertex.size=V(g2)$in_degree/10, edge.width=E(g2)$weight, layout=layout_nicely(g2))

plot(g2, vertex.label.color="black", vertex.label.cex=2.5, edge.color="gray77", vertex.color=V(g2)$color, vertex.size=V(g2)$in_degree/10, edge.width=E(g2)$weight, layout=layout_nicely(g2))

Guatemala
g3<-graph_from_data_frame(edgelistguate, directed=FALSE, vertices=guatenodes)
#Add attributes
E(g3)$weight<-E(g3)$abs
# The graph is undirected, so degree() ignores mode; in_degree holds the plain degree
V(g3)$in_degree<-degree(g3)
colrs<-c("#ea1d2d", "#d19f2a","#2d9a47", "#c22033","#ef412a", "#00add9", "#fdb714", "#8f1838", "#f36e24", "#e01a83", "#f99d25", "#cd8b2a", "#48773c", "#007dbb", "#40ae49", "#00558a", "#1a3668")
V(g3)$color<-colrs[V(g3)$goal]
#Plot graph
plot(g3, vertex.label=NA, edge.color="gray77", vertex.color=V(g3)$color, vertex.size=V(g3)$in_degree/10, edge.width=E(g3)$weight, layout=layout_nicely(g3))
